12 results found.
Written
Corpus,
Language Type:
Monolingual
Languages:
Uzbek
Availability:
From Data Center(s)
License:
Size:
4 GByte
Production Status:
Existing-used
Use:
Morphological Analysis
- Paper title: Morphological Segmentation for Low Resource Languages
- Paper track: Written/poster presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Justin Mott | BOLT LRL Uzbek representative language pack v1.0 | /N |
Documentation:
None
Written
Dictionary,
Language Type:
Multilingual
Languages:
Azerbaijani Kazakh Kirghiz Turkish Uzbek
Availability:
Freely Available
License:
OpenSource
Size:
5 languages Other
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Cross-Lingual Word Embeddings for Turkic Languages
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Elmurod Kuriyozov | Bilingual dictionaries for Turkic Languages | /N |
Documentation:
None
Written
Word embeddings,
Language Type:
Monolingual
Languages:
Uzbek
Availability:
Freely Available
License:
OpenSource
Size:
523.3 MByte
Production Status:
Newly created-finished
Use:
Knowledge Discovery/Representation
- Paper title: Cross-Lingual Word Embeddings for Turkic Languages
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Elmurod Kuriyozov | Word embeddings for Uzbek | /N |
Documentation:
None
Written
Corpus,
Language Type:
Monolingual
Languages:
Arabic Azerbaijani Belarusian Bulgarian Catalan Danish English Estonian Filipino Finnish Hindi Hungarian Indonesian Irish Italian Japanese Kazakh Korean Latvian Lithuanian Mongolian Norwegian Polish Portuguese Russian Serbian (Latin) Slovenian Spanish Swedish Tamil Turkish Ukrainian Urdu Uzbek Vietnamese ces deu ell fas fra isl kat mkd nld ron slk sqi zho
Availability:
Freely Available
License:
GNU-GPL v.3
Size:
45 billion words
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: Geographically-Balanced Gigaword Corpora for 50 Language Varieties
- Paper track: Written/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Jonathan Dunn | GeoWAC | /N |
Documentation:
https://github.com/jonathandunn/earthlings
Written
Corpus,
Language Type:
Monolingual
Languages:
Adyghe Ancient Greek Anglo-Norman Arabic Asturian Azerbaijani Bangla Bashkir Belarusian Breton Bulgarian Catalan Central Kurdish Church Slavic Classical Armenian Classical Syriac Cornish Crimean Tatar Danish English Estonian Faroese Finnish Friulian Galolen Gothic Haida Hebrew Hindi Hungarian Ingrian Irish Italian Kabardian Kalaallisut Kannada Karelian Kashubian Kazakh Khakas Khaling Ladin Latin Latvian Lithuanian Livonian Livvi Low German Lower Sorbian Ludian Maltese Manx Mapuche Middle French Middle High German Middle Low German Murrinh-Patha Navajo Neapolitan No linguistic content Northern Frisian Northern Kurdish Northern Sami Norwegian Bokmål Norwegian Nynorsk Occitan Old English Old French Old Irish Old Saxon Pashto Polish Portuguese Quechua Russian Sanskrit Scottish Gaelic Serbian (Latin) Slovenian Spanish Swahili (Congo - Kinshasa) Swedish Tajik Tatar Telugu Turkish Turkmen Ukrainian Urdu Uzbek Venetian Veps Votic Western Frisian Yiddish Zulu bod ces cym deu ell eus fas fra hye isl kat mkd nld ron sqi
Availability:
Freely Available
License:
CC BY-SA 3.0
Size:
None
Production Status:
Existing-updated
Use:
Morphological Analysis
- Paper title: UniMorph 3.0: Universal Morphology
- Paper track: Infrastructural Issues/Large Projects/oral presentation
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Ekaterina Vylomova | UniMorph | /N |
Documentation:
None
Written
Corpus,
Language Type:
Multilingual
Languages:
Afrikaans Albanian Amharic Arabic Aragonese Armenian Assamese Azerbaijani Basque Belarusian Bengali Bosnian Breton Bulgarian Burmese Catalan Central Khmer Chinese Croatian Czech Danish Dutch Dzongkha English Esperanto Estonian Finnish French Gaelic Galician Georgian German Greek Gujarati Hausa Hebrew Hindi Hungarian Icelandic Igbo Indonesian Irish Italian Japanese Kannada Kazakh Kinyarwanda Korean Kurdish Kyrgyz Latvian Limburgan Lithuanian Macedonian Malagasy Malay Malayalam Maltese Marathi Mongolian Nepali Northern Sami Norwegian Norwegian Bokmål Norwegian Nynorsk Occitan Oriya Panjabi Pashto Persian Polish Portuguese Romanian Russian Serbian Serbo-Croatian Sinhala Slovak Slovenian Spanish Swedish Tajik Tamil Tatar Telugu Thai Turkish Turkmen Uighur Ukrainian Urdu Uzbek Vietnamese Walloon Welsh Western Frisian Xhosa Yiddish Yoruba Zulu
Availability:
Freely Available
License:
Size:
55 million sentences
Production Status:
Existing-used
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Improving Massively Multilingual Neural Machine Translation and Zero-Shot Translation
- Paper track: Long/Machine Translation
- Paper status: Accept
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Biao Zhang | the open parallel corpus (OPUS) | /N |
Documentation:
None
Speech
Corpus,
Language Type:
Monolingual
Languages:
Arabic Bengali Central Khmer Chinese Dari Egyptian Arabic English Georgian Hindi Iranian Persian Italian Japanese Korean Lao Mandarin Chinese Min Nan Chinese Moroccan Arabic Northern Khmer Panjabi Persian Russian Spanish Tagalog Thai Tigrinya Urdu Uzbek Vietnamese Wu Chinese Yue Chinese
Availability:
From Data Center(s)
License:
LDC
Size:
None
Production Status:
Existing-used
Use:
Speech Recognition/Understanding
- Paper title: End-to-End Neural Speaker Diarization with Permutation-Free Objectives
- Paper track: 4.5 Speaker diarization/Poster Presentation
- Paper status: Accept - Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Main Contact | Yusuke Fujita | 2008 NIST Speaker Recognition Evaluation | /N |
Documentation:
None
<Not Specified>
Corpus,
Language Type:
Multilingual
Languages:
Turkish Uzbek
Availability:
<Not Specified>
License:
<Not Specified>
Size:
<Not Specified> <Not Specified>
Production Status:
Existing-used
Use:
Information Extraction, Information Retrieval
- Paper title: Bitext Name Tagging for Cross-lingual Entity Annotation Projection
- Paper track: Under-resourced Languages
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Dongxu Zhang | Beijing University of Posts and Telecommunications | CN |
| Author 2 | Boliang Zhang | Rensselaer Polytechnic Institute | US |
| Author 3 | Xiaoman Pan | Rensselaer Polytechnic Institute | US |
| Author 4 | Xiaocheng Feng | Harbin Institute of Technology, SCIR Lab | CN |
| Author 5 | Heng Ji | Rensselaer Polytechnic Institute | US |
| Author 6 | Weiran XU | Beijing University of Posts and Telecommunications | CN |
| Main Contact | Dongxu Zhang | Beijing University of Posts and Telecommunications | None |
Documentation:
<Not Specified>
Language Type:
Trilingual
Languages:
Hausa Turkish Uzbek
Availability:
From Data Center(s)
License:
available in 2016
Size:
100M <Not Specified>
Production Status:
Newly created-finished
Use:
Corpus Creation/Annotation
- Paper title: Rapid Development of Morphological Analyzers for Typologically Diverse Languages
- Paper track: Infrastructural Issues/Large Projects
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Seth Kulick | Linguistic Data Consortium, University of Pennsylvania | US |
| Author 2 | Ann Bies | Linguistic Data Consortium, University of Pennsylvania | US |
| Main Contact | Seth Kulick | Linguistic Data Consortium, University of Pennsylvania | None |
Documentation:
<Not Specified>
Language Type:
Multilingual
Languages:
English Uzbek
Availability:
the corpus will be published via LDC general publication catalog
License:
<Not Specified>
Size:
9080 words
Production Status:
Newly created-finished
Use:
Machine Translation, SpeechToSpeech Translation
- Paper title: Uzbek-English and Turkish-English Morpheme Alignment Corpora
- Paper track: Written
- Paper status: Accept Poster
| Author Number | Name | Affiliation | Country |
|---|---|---|---|
| Author 1 | Xuansong Li | Linguistic Data Consortium, University of Pennsylvania | US |
| Author 2 | Jennifer Tracey | Linguistic Data Consortium, University of Pennsylvania | US |
| Author 3 | Stephen Grimes | Linguistic Data Consortium, University of Pennsylvania; University of Pennsylvania | US |
| Author 4 | Stephanie Strassel | LDC; Linguistic Data Consortium, University of Pennsylvania | US |
| Main Contact | Xuansong Li | Linguistic Data Consortium, University of Pennsylvania | None |
Documentation:
<Not Specified>




